Using Early-Stopping to Avoid Overfitting in Wrapper-Based Feature Selection Employing Stochastic Search
Author
Abstract
It is acknowledged that overfitting can occur in wrapper-based feature selection when only a limited amount of training data is available. It has also been shown that the severity of overfitting is related to the intensity of the search algorithm used during this process. In this paper we show that two stochastic search techniques that can be used for wrapper-based feature selection, Simulated Annealing and Genetic Algorithms, are susceptible to overfitting in this way. However, because of their stochastic nature, these algorithms can be stopped early to prevent overfitting. We present a framework that implements early stopping for both of these stochastic search techniques, and we show that it reduces the effects of overfitting and increases generalisation accuracy in most cases.
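The idea of stopping a stochastic wrapper search early can be illustrated with a minimal Python sketch. It is not the framework from the paper: the hold-out validation split, the patience-based stopping rule, the 1-nearest-neighbour classifier, the synthetic data, and every name (`make_data`, `accuracy`, `loo_accuracy`, `anneal_with_early_stopping`) are assumptions made for illustration. Simulated Annealing optimises the internal (leave-one-out) accuracy on the search data, while a separate validation set decides when to stop and which feature mask to return.

```python
import math
import random


def make_data(n, n_noise=8, seed=0):
    """Synthetic two-class data: 2 informative features followed by noise features."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        data.append((informative + noise, label))
    return data


def accuracy(mask, train, test):
    """1-NN accuracy on `test`, using only the features selected by `mask`."""
    idx = [i for i, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for x, y in test:
        nearest = min(train, key=lambda t: sum((t[0][i] - x[i]) ** 2 for i in idx))
        correct += nearest[1] == y
    return correct / len(test)


def loo_accuracy(mask, data):
    """Internal wrapper score: leave-one-out 1-NN accuracy on the search data."""
    hits = sum(accuracy(mask, data[:i] + data[i + 1:], [data[i]])
               for i in range(len(data)))
    return hits / len(data)


def anneal_with_early_stopping(search, val, steps=200, patience=30, t0=0.1, seed=0):
    """Simulated Annealing over feature masks, stopped by validation accuracy.

    Assumption: the search stops once `patience` consecutive steps pass
    without an improvement in validation accuracy.
    """
    rng = random.Random(seed)
    n_features = len(search[0][0])
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = loo_accuracy(current, search)
    best_mask, best_val = current[:], accuracy(current, search, val)
    stale, steps_run = 0, 0
    for step in range(steps):
        steps_run = step + 1
        temp = t0 * (1 - step / steps) + 1e-9  # linear cooling schedule
        cand = current[:]
        j = rng.randrange(n_features)
        cand[j] = not cand[j]  # neighbour: flip one feature in or out
        cand_score = loo_accuracy(cand, search)
        improved = False
        # Accept better moves always, worse moves with Boltzmann probability.
        if (cand_score >= current_score
                or rng.random() < math.exp((cand_score - current_score) / temp)):
            current, current_score = cand, cand_score
            val_score = accuracy(current, search, val)
            if val_score > best_val:
                best_mask, best_val, improved = current[:], val_score, True
        stale = 0 if improved else stale + 1
        if stale >= patience:
            break  # early stop: validation accuracy has stopped improving
    return best_mask, best_val, steps_run


search_set = make_data(40, seed=1)
val_set = make_data(40, seed=2)
mask, val_acc, used = anneal_with_early_stopping(search_set, val_set)
print("selected features:", [i for i, keep in enumerate(mask) if keep])
print("validation accuracy:", val_acc, "after", used, "steps")
```

The returned mask is the one with the best validation accuracy seen so far, not the final state of the search, so a long run that drifts toward subsets fitting only the search data does not affect the result. The same stopping rule could be applied per generation in a Genetic Algorithm.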
Similar resources
Using Early Stopping to Reduce Overfitting in Wrapper-Based Feature Weighting
It is acknowledged that overfitting can occur in feature selection using the wrapper method when there is a limited amount of training data available. It has also been shown that the severity of overfitting is related to the intensity of the search algorithm used during this process. We demonstrate that the problem of overfitting in feature weighting can be exacerbated if the feature weighting ...
Overfitting in Wrapper-Based Feature Subset Selection: The Harder You Try the Worse it Gets
In wrapper-based feature selection, the more states that are visited during the search phase of the algorithm, the greater the likelihood of finding a feature subset that has a high internal accuracy while generalizing poorly. When this occurs, we say that the algorithm has overfitted to the training data. We outline a set of experiments to show this and we introduce a modified genetic algorithm...
Fuzzy-rough Information Gain Ratio Approach to Filter-wrapper Feature Selection
Feature selection for various applications has been carried out for many years in many different research areas. However, there is a trade-off between finding feature subsets with minimum length and increasing the classification accuracy. In this paper, a filter-wrapper feature selection approach based on fuzzy-rough gain ratio is proposed to tackle this problem. As a search strategy, a modifie...
Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets
Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...
Feature Subset Selection Using the Wrapper Method: Overfitting and Dynamic Search Space Topology
In the wrapper approach to feature subset selection, a search for an optimal set of features is made using the induction algorithm as a black box. The estimated future performance of the algorithm is the heuristic guiding the search. Statistical methods for feature subset selection including forward selection, backward elimination, and their stepwise variants can be viewed as simple hill-climbi...